AbCD: arbitrary coverage design for sequencing-based genetic studies

Authors

  • Jian Kang
  • Kuan-Chieh Huang
  • Zheng Xu
  • Yunfei Wang
  • Gonçalo R. Abecasis
  • Yun Li
Abstract

Recent advances in sequencing technologies have revolutionized genetic studies. Although high-coverage sequencing can uncover most variants present in the sequenced sample, low-coverage sequencing is appealing for its cost effectiveness. Here, we present AbCD (arbitrary coverage design) to aid the design of sequencing-based studies. AbCD is a user-friendly interface providing pre-estimated effective sample sizes, specific to each minor allele frequency category, for designs with arbitrary coverage (0.5-30×) and sample size (20-10,000), and for four major ethnic groups (Europeans, Africans, Asians and African Americans). We also present two software tools, ShotGun and DesignPlanner, which were used to generate the estimates behind AbCD. ShotGun is a flexible short-read simulator for arbitrary user-specified read length and average depth, allowing cycle-specific sequencing error rates and realistic read depth distributions. DesignPlanner is a full pipeline that uses ShotGun to generate sequence data, performs initial SNP discovery, uses our previously presented linkage disequilibrium-aware method to call genotypes, and, finally, provides minor allele frequency-specific effective sample sizes. Together, ShotGun and DesignPlanner can estimate effective sample size for any combination of high-depth and low-depth data (for example, whole-genome low-depth plus exonic high-depth) or any combination of sequence and genotype data [for example, whole-exome sequencing plus genotyping from an existing genome-wide association study (GWAS)].
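
The abstract does not say how effective sample size is computed. A convention often used in this setting, and assumed here purely for illustration, is to scale the number of sequenced individuals by the squared correlation (r²) between true and called genotypes. The function names below are hypothetical and are not part of AbCD, ShotGun or DesignPlanner; this is a minimal sketch under that assumption, not the authors' implementation.

```python
def dosage_r2(true_genotypes, called_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2)
    and called genotype dosages for one variant."""
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_c = sum(called_dosages) / n
    cov = sum((t - mean_t) * (c - mean_c)
              for t, c in zip(true_genotypes, called_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_c = sum((c - mean_c) ** 2 for c in called_dosages)
    if var_t == 0 or var_c == 0:
        # A monomorphic variant carries no information about correlation.
        return 0.0
    return cov * cov / (var_t * var_c)


def effective_sample_size(n_sequenced, r2):
    """Assumed convention: effective sample size = n * r^2."""
    return n_sequenced * r2


# Illustrative values only: five individuals with imperfect calls.
true_g = [0, 1, 2, 0, 1]
called = [0.1, 0.9, 1.7, 0.0, 1.2]
r2 = dosage_r2(true_g, called)
n_eff = effective_sample_size(len(true_g), r2)
```

Under this convention, perfectly called genotypes give r² = 1 and the effective sample size equals the number sequenced, while noisier calls at low coverage shrink it.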

Similar resources

Population genetic studies of Liza aurata using D-Loop sequencing in the southeast and southwest coasts of the Caspian Sea

Genetic diversity as an important marker of the ecological status of aquatic ecosystems is considered a unique and powerful tool to evaluate biological communities. In order to evaluate the genetic diversity among golden mullet species (Liza aurata) in the southeast and southwest coasts of the Caspian Sea by D-Loop gene sequencing, a total of 23 fin specimens of golden mullet were collected fro...

Genotype Calling from Population-Genomic Sequencing Data

Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As the statistical uncertainties associated with sequencing data depend on depths of coverage, we ha...

An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data.

Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains se...

Design of association studies with pooled or un-pooled next-generation sequencing data.

Most common hereditary diseases in humans are complex and multifactorial. Large-scale genome-wide association studies based on SNP genotyping have only identified a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (a minor allele frequency, MAF <5%), which are not included in the common genotyping platforms, may contribute substantially...

Journal:
  • Bioinformatics

Volume 29, Issue 6

Pages: -

Publication date: 2013